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Abstract. In this paper, several modifications are introduced to the 
functional approximation method iterLap [2] to reduce the approxima¬ 
tion error, including stopping rule adjustment, proposal of new resid¬ 
ual function, starting point selection for numerical optimisation, scaling 
of Hessian matrix. Illustrative examples are also provided to show the 
trade-off between running time and accuracy of the original and modified 
methods. 


1 Introduction 

Compared to Monte Carlo methods, functional approximation does not have 
the asymptotic convergence property, e.g. even when a method is applied re¬ 
peatedly, the approximation may not converge to the target density at all. Still, 
these methods are extremely useful in a practical application where time is the 
most invaluable resource. Furthermore, functional approximations can be used 
to complement Monte Carlo methods (used as importance sampling or MCMC 
proposal functions). For an IS algorithm, this may produce samples appropri¬ 
ately in the target support and result in evenly weighted samples. A MCMC 
trajectory may move adaptively to the local correlation or escape a modal trap 
by using these approximations. 

Among all methods, Laplace approximation [5] is the simplest but still very 
useful. In general, this method approximates a target non-normalised density 
by a normal distribution. Laplace approximation has all the useful properties 
of a normal distribution, e.g. the normal form of the conditional and marginal 
distributions. 

As distributions tend to converge to normal form asymptotically, Laplace 
approximation is very efficient in these cases. However, for non-normal distri¬ 
bution, this approximation suffers from two shortcomings. Firstly, it only works 
well with a uni-modal function and ignores the other modes if the target density 
is multi-modal. Secondly, the normal distribution implies a linear correlation be¬ 
tween random variables and hence cannot accommodate non-linear dependency. 

Variational Bayes and expectation propagation [T] are alternative solu¬ 
tions. However, as these methods aim to minimise the Kullback-Leibler (KL) 
divergence over an approximated density of specific form, they are only efficient 


when the KL divergence can be evaluated in closed form. Furthermore, these so¬ 
lutions are not as plugin as Laplace approximation due to the problem-specific 
derivation of KL divergence. 

As an extension of the normal form in Laplace approximation, the mix¬ 
ture of normal distributions is a very flexible family of distributions, capable of 
accommodating multi-modal and non-linear functions. A numerical estimation 
procedure for the weights, means and variances of the mixture form was given 
in [4]. Still, there are issues with this algorithm. With a skewed uni-modal tar¬ 
get density, all the modes and precisions, will be almost same (the difference is 
caused by numerical computation) and the resulting approximation is symmetric 
(by normal distribution), not being able to reflect the skewness of the target. 
Furthermore, it may be wasteful to do multiple simultaneous mode optimisation 
by random starting points. 

Addressing these issues, Bornkamp [5] proposed the iterated Laplace ap¬ 
proximation (iterLap), correcting the approximation discrepancy iteratively by 
adding a new component at an appropriate location. 

Another way to approximate a target density by a Gaussian mixture is to use 
expectation maximisation (EM) algorithm [T]. Unlike above methods, the EM 
solution relies on samples of the target density instead of numerical evaluation. 
More complex solutions are also needed to address the issue of unknown number 
of Gaussian components [S]. 

In this work, we propose a modified version of iterLap which is a direct ap¬ 
proximation to numerical evaluation of a target density. It is shown by various 
experiments that this modified version is slower but provides smaller approxi¬ 
mation error, compared to the original iterLap solution. 

So, Section provides a description of original iterLap algorithm. Then, 
Section gives an overview and discussion over several modihcations of a new 
approximation. Next, we show various numerical examples to compare the trade¬ 
off of running time and approximation error of two algorithms in Section 
Finally, Section concludes this work. 


2 Iterated Laplace approximation (iterLap) 

The description of iterLap algorithm is given in Algorithm [21 [3] . 

To illustrate the performance difference between Laplace and iterLap approx¬ 
imations, Figure shows both approximations for the following target density 
of a random variable x = (xi, 0 : 2 ): 

qx{x) cx A^(xi|0, al = lQ^)N{x 2 \^J .2 = 0.03xi, cr| = I^). (7) 

In Figure iterLap approximation is much better in terms of adapting to the 
non-linear dependency between two variables. Still, the iterLap method can be 
further improved by some modifications, which will be discussed later. 


Algorithm 1 iterLap 


1. Find ni local minima and their corresponding Hessian matrices Qi of g{-) = 
— log((/j;(-)) (like the above algorithm). Let ric be the current number of com¬ 
ponents (initially, Uc = rii). The current approximation qna-,x{-) with unknown 
non-normalised weights is: 


gnc;x(a;) = (1) 

2. Assume that for each component i, there are location vectors Xg-^ij (j = 1 : n^) 

generated from the distribution is the length of vector x. Let X 

be a m X da; matrix comprising all the location vectors Xg-ij and the mean /ii 
(each row Xk,i-.d^ {k — 1 : m) is either Xg-^i^j or gi). Let y be a vector of length 
m comprising the values of the target density, evaluated for all locations in X-. 
Uk = qx{Xk,i:d^). Let Z be a m X nc matrix, satisfying: Zk,i = N{Xk,i:d^\gi,Qi). 
The weights w = can be obtained by using quadratic programming: 

w = a.rgmm[{y — Zb)'^ {y — Zb)], b>0. (2) 

b 

3. Define a residual function guai') with: 

z = q^{x) - qn„;x{x), 

— log(z) it z> Zi 

- log(exp( 2 ; - zi)zi) it z < zi 

A lower bound zi = le“^ is used in the R package iterLap [3]. Find a new compo¬ 
nent by minimising the residual function: 



(3) 

(4) 


g-u^+i = argmin{ 3 „^(a:)}, 

X 


Qric + l 


dx^ 




(5) 

( 6 ) 


The starting points of this optimisation step are chosen by checking the ratio 
qx{x)/qn,,-x{x). Points with large ratios are preferred (the locations where qn^-,x{x) 
underestimates qx{x)). 

4. Update the number of components: ric = ric + 1. It one of the following criteria is 
satisfied: 

— ric > ric-max where ric-,max is a predefined maximum number of components. 

— \qx{x) — qnc-,x{x)\ < (5 with a predefined error bound 5. 

— X]r=i does not improve (by comparing with previous sums of weights). 

— Cannot find a new component. 

then re-estimate the weights by step 2 and stop the algorithm. Otherwise, repeat 
steps 2 —>■ 4. 








(a) Laplace approximation (b) iterLap approximation 

Fig. 1: Laplace and iterLap approximations. The target density and approxima¬ 
tions are drawn by blue and black contours accordingly. The red crosses are the 
means of normal components. The iterLap approximation has 15 components. 


3 Modifying the iterLap method 

Based on Section]^ in this section, we propose several modifications to improve 
the performance of iterLap approximation. 


Stopping rule by the normalising constant 

As a reminder, iterLap approximates a non-normalised density qx{‘) at iteration 
ric by: 


Tic 

i=l 

where /ii, Qi are found by an optimisation procedure and non-normalised weights 
Wi are estimated by quadratic programming in Section By the above approx¬ 
imation, the normalising constant Q of q^i’) is estimated by: 


c 




= / qn 


.■x{x)dx = ^ 


Wi. 




(9) 


The constant represents the probability mass of the approximated density 
qn^-,x{x). Hence, in [2], the iterative process stops when does not improve 
any more, i.e. satisfying the following equation: 

|C„. - 0.5(C„.-1 + Cne-2)l/Cn. < (5c, (10) 


with a predefined threshold 5^. 


XX 




However, even though a new component N{x\^i,Qi) does not improve the 
estimated volume of qna;x{x), it may still correct the density and decrease the 
approximation error. Furthermore, a new component generates more explored 
points which may be useful in the optimisation and quadratic programming step. 
Hence, we remove this stopping criterion from the iterLap source code. 


Residual function 

At each iteration, iterLap [3] finds a new component by minimising a residual 
function ga-n^i^) with: 


z = qj;{x) - qn,-x{x), 

f-log(z) if z > Zi > 0 

I - log(exp(z - zi)zi) if z < Zi 


( 11 ) 

( 12 ) 


where z/ is a predefined positive lower bound. The new component’s mean and 
precision matrix are obtained by: 


gn,+i = argmin{go;ne(a:)}) 

X 

d’^Qa-nS^) 


Qn^ + l — 


dx'^ 


(13) 

(14) 


We will use the following example to illustrate and discuss the above step. 
Example 1. Consider a non-normalised density qx{‘)' 


qx{x) = exp ( - ;yy^(|a:| - 0.5)^ 


2 X 52 


+ ’ 


(15) 


where a+ = a if a > 0 and a+ = 0 if a < 0. 

At iteration 1, qxi‘) is approximated by qi-xi') with a single normal com¬ 
ponent and the residual function is The functions qx{‘), qi\xi‘) and 

exp (—ga;i(’)) are plotted in Figure]^ The maximum points of exp (—ga;i(-)) 
or equivalently, the minimum points of ga-pi'), are also shown, representing the 
next potential component means. 

Figure|^shows that there are two potential maximum points at inf and — inf, 
which are not good locations for new components at all because they do not have 
any effect on the approximation. So, these locations should be avoided to save 
computation time. 

From Equation [TT] and Figure it can be seen that iterLap prefers choosing 
the locations at which qx{x) is significantly underestimated by qn^;x{x))- The 
locations in the overestimated region (qxix) < qn^-x(x))) are ignored. However, 
by experimenting with several residual functions and examples, we find that 
adding new components in the overestimated region does improve the approxi¬ 
mation. Partially, this may be because the explored variable space is extended 
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Fig. 2: Example]^ the blue and black lines show the target density qx{‘) and the 
approximated density 5i;a:(’); the purple line is exp (—5a;i(’)) with red crosses as 
maximum points (the leftmost and rightmost crosses are actually inf and — inf); 
the lower bound zi = 0.05 is marked by dashed horizontal line. 


by a more lenient rule, which in turn improves the optimisation and quadratic 
programming steps. 

So, we propose a new residual function g-n^-rtix) with: 

z = qx{x)-qn,-x{x), (16) 

i-\og{z + e^) ifz>0 

\-[^ogi-z + e^) + a{log{qxix)) - lqmax-,x)]/il + a) if z < 0 ’ 

(17) 


where lqmax\x is the maximum log value of log {qx{x)) until the current iteration; 
a very small constant is used only for the positivity condition of the 

log operator; a > 0 is an optional coefficient which pulls the optimisation point 
of gb-nSx) to a location of high log density value, log {qx{x)). Figurej^illustrates 
the above residual function gn^-rt,{x) with a = 0 and a = 3. 


Select starting points for the optimisation 

Using different starting points for the optimisation of gb\n^{x) results in different 
component means. Bornkamp [2] uses the ratio qx{x)/qn^-,x{x) as the criterion of 
choosing the optimisation starting points. However, such a criterion may lead to 
a point located at the distribution tail. For example, with a t-distribution qx{x) 
and a normal distribution qn^-,x{x), the ratio is extremely large at the tail. 

So, we use the absolute difference \qx{x) — qn^-,x{x)\ as a selection criterion 
which is closely related to the residual function gb;n^ {x). As a reminder of Section 
iterLap keeps a m x dx matrix X of all explored locations, a vector y of 
target density values qxi‘) and a mx Uc matrix Z of component density values 









(a) a = 0 (b) a = 3 

Fig. 3: Example[^ the blue, black and purple lines show the target density qx{-), 
the approximated density 9i;a;(-) and the function exp (—5b;i(-)) with red crosses 
as maximum points of exp {—gb-i{'))- 


Qi) evaluated at the explored locations. The algorithm to select starting 
points Xs from X is shown in Algorithm 


Algorithm 2 Select optimisation starting points 


1. Let Xa = X. Remove from Xa points Xi with small target density values, i.e. 
satisfying the condition: log (ga:( 2 ;i)) — lqmax-,x < Siq 

2. Find a point Xj in Xa which maximises \qx{xj) — qna-,x{xj)\. Add that point Xj to 

A.. 

3. Remove from Xa points Xk close to Xj. 

4. Stop the algorithm if there are enough starting points in Xs or if Xa is empty. 
Otherwise, repeat steps 2 —>■ 4. 


A note on standardising qx{‘) and qx{‘) 

Usually, the comparison between two densities is done based on their normalised 
densities. However, aside from the difficulty of obtaining the normalising con¬ 
stant, there is another issue. It is almost impossible to have a ’’good” approx¬ 
imation at the tail by any method. The absolute difference \qx{,x) — qxi‘)\ at 
a specific location at the tail may be extremely small but in high dimension 
space, the sum of all the absolute differences in every direction of the tail may 
become significant. Consequently, even though qx{-) is a ’’good” approximation 
of qxi') locally, the normalised Pxi') is however a ’’poor” approximation of the 
normalised Pxi’)- 











Hence, instead of globally normalising qx{-) and qx{‘), we only standardise 
these two densities on a user-defined grid Xg. qx{ ) is evaluated for all grid points 
Xj and the standardised density rx(-) on the grid is obtained by: 


rx(xj) 


Ey qx{xj>y 


(18) 


The standardised rx{-) is obtained in a similar manner. So, local comparison of 
two densities can be done by analysing the standardised densities. There is one 
statistic s(rx,rx) that we find reasonable for the comparison purposes between 
two standardised densities: 


s{rx,rx) = ^ \rx{xi) - rx{xi)\. (19) 

XiGXg 

With the same grid Xg, s{rx,ra-x) can be compared with another s{rx,rb-x)- 


Scaling a component’s Hessian 

In iterLap, the Hessian of gb^nAx) at the mode is used as the component’s preci¬ 
sion matrix. Usually, the sharper the curvature of pb-n^ix)-, the larger the eigen¬ 
values of the Hessian matrix at the mode and the more focused the corresponding 
normal component. 

In some cases, the numerically evaluated Hessian matrix at the mode is 
sharper than the actual curvature and iterLap may fail to improve such a target 
density. In Example[^ the function log {—qx{x)) is a quadratic function near the 
mode a; = 0 but it becomes a cubic function with much sharper curvature when 
\x\ > 0.5. As a result, the approximated qi-^x{x) at the first iteration is much 
flatter than the target density, which is shown in Figure Unfortunately, in this 
example, even with more extra components, the iterLap approximation does not 
improve. 

There are two ways to get around this issue. Firstly, we can allow a man¬ 
ual scaling of the Hessian matrix. With a user-defined scale factor Ka, a new 
component is added with a precision matrix: 

^ _ d‘^ga,nAx) 



This modification is analogous to using kernel density estimation with a small 
bandwidth. A component with large precision matrix affects the approximation 
in a small region. 

With Ka = 1.5, the standardised densities for Examplej^are shown in Figure 
|4a[ Clearly, the approximation becomes much better. 

One potential problem with this manual scaling is that all precision matrices 
become larger and all corresponding components are sharper, affecting only small 
local regions centred at the means. Similar to the case of using a small bandwidth 
in kernel method, the approximation becomes wriggly and only gets smoother 
with a larger number of components. 





0.015- 



(b) Multiplicative scaling with Udup = 3 and 
(a) Ka = 1.5 with no multiplicative scaling. Kb = 1.25. Ka = 1. has ric = 12 compo- 

^nc;a:(') has 71^ = 14 Components nents 


Fig. 4: Example the iterLap approximation with hessian scaling and multi¬ 
plicative scaling; the blue and black curves are the standardised densities rx{-) 
and Tn^-xi’)- 


In a second attempt, we try to only adjust the Hessian matrix when needed. 
In Figure]^ one of the potential means for the next components is fj, = 0 (by the 
optimisation of As ^ = 0 is already a component mean of the approxi¬ 

mated qj;a;(-), iterLap will not accept /r = 0 as a mean of a new component and 
try to find other locations. Also, even if /i = 0 is accepted, a new component 
with mean ^ = 0 and the usual precision matrix will not make any difference. 
So, we make the following modifications: 

— Allow the duplication of component means but there are no more than Udup 
duplicates at a specific location, is a duplicate of Xi-^ if Xi ^, Xi^ are close 
enough) 

— Assume that the component N(-\^ = Xi^^Q = Qi^ is added first at the 
location Xi^. Then, more duplicates Xi- of Xi^ are found and corresponding 
components A^(-|^ = Xi^,Qi.) are added to the iterLap approximation. 

is calculated by: 


Qij — j 


( 21 ) 


with a user-defined factor Kb 

We call this multiplicative scaling which can be used in conjunction with the 
usual Hessian scaling. In that case, only the first Hessian Qi^ is scaled with 
Ka- However, usually either Hessian scaling or multiplicative scaling is used. 
The approximation for Example with multiplicative scaling (ridup = 3 and 
Kb = 1.25) is shown in Figure [4b| 







Other modifications 

There are some other modifications to iterLap, including: 


— The functionality of manually adding optimisation starting points to Xg and 
explored points to X. Generally, when a new normal component with mean 
H is added to the iterLap approximation, aside from the starting points and 
explored points proposed by iterLap, a user can manually add points relative 
to the mean /x. This is useful in the case a user knows something about the 
target density, e.g. the support or the variable correlation. For example, a 
user may add more starting points, checking if there is any unexplored mode 
in the interested support. Another potential usage of this functionality is 
when X can be decomposed into (xa,Xb) and the conditional expectation 
-l^xalxb(^a) is known by a function f(xt,) (this frequently occurs in Bayesian 
inference with conjugate prior). Hence, when a new component with mean 
H = (^a, Mb) is added, it may be worth to check the locations (/(/ib+db), Mb + 

6b). 

— In the quadratic programming step for estimating w, we use weighted least 
squares: 


w = a,rgmin[{y — Zb)"'" {u}xl){y — Zb)], b>0. (22) 

b 

where uJx.,i is a weight for a explored point Xi in X. It is designed that a user 
can adjust the weight ujx-^i for a location of a component mean y = Xi. 

— Make the code more robust to computer numerical issues, e.g. derive the 
matrix Z from a normalised log version, make the Hessian matrix of gb-n^^ix) 
positive definite and scale the parameters in the quadratic programming 
step. 

The iterLap version with all above modifications is named mod-iterLap and 
will be compared the original iterLap by some examples in the next section. In 
each example, both versions are run with a predehned maximum number of com¬ 
ponents Uc-max but the resulting approximations may have less components than 
nc;max due to Stopping rules. Furthermore, in mod-iterLap, we use a simplifica¬ 
tion step to remove insignihcant components, e.g. components with normalised 
weights < e~^. This simplification reduces the computation cost when the 
approximation is used for other purposes. 

4 Comparison 

Firstly, we consider a density with a non-linear dependency in two-dimensional 
space. In this section, notations qo;xi‘)j Xo-^xi.') are the approximated densities of 
the original iterLap and qm.,x{ ), rm;a;(') are for the modihed iterLap. 

Example 2. Define a target density Px{‘) on a; = {xa,Xb). 

Px{x) = N{xa\pi = 0, cr^ = 10'^)N{xb\y = 0.03(a;a - 3)^ -|-5 ,ct^ = I^). (23) 



Both approximations are run with the maximum number of components 
nc;max = 50. To-xi’) stops with Tic = H Components while has ric = 27 

components. The contours of the standardised densities are shown in Figure 
Clearly, the modified version has a better capture of the non-linear dependency. 



(a) iterLap (b) mod-iterLap 

Fig. 5: Examplej^ the blue and black contours are the target standardised density 
ra;(-) and the approximated standardised density respectively. The red 

crosses are the component means. 


Figure shows the variable correlation but not the approximation error. So, 
to visually compare two approximations, rx{-) is evaluated in a grid and then 
sorted in decreasing order of rx{‘). The approximated r.-xi') is sorted by the same 
order. Both of them are then plotted in Figure]^ We also calculate the statistic of 
Equation 19 for two approximations: s(rx,ro-,x) = 0.424 and s{rx,rm\x) = 0.078. 
Running times in R language are included in the standardised density plots in 
this section. 

The next example is for testing a density with non-linearity and multi¬ 
modality. 


Example 3. Define a target density Px{‘) on x = (xo,Xh): 

Px{x) = 0.5N{xa\^i = -1, = 6)N{xb\^i = -0.5{xa + 1)" + 3, = 2) (24) 

-f 0.5N{xa\fi = 1, cr^ = 6)iV(x&|/r = 0.5(xa - 1)^ - 3, cr^ = 2). 


With nc;max = 100, iterLap stops with To-^xi’) of ric = 12 components 
while mod-iterLap has rrn;x(') with ric = 56 components. The contour plots 
and ordered standardised density plots are shown in Figure and Figure 
sirxjTo^x) = 0.602 and s{rx,r^-x) = 0.066. 

In Example]^ we try increasing the dimension of the parameter space. 






standardized density 


0 . 003 - 


0.003 - 




(a) iterLap: 0.222 seconds (b) mod-iterLap: 1.142 seconds 

Fig. 6: Examplethe plots of the ordered standardised densities. The blue and 
black curves are rx(-) and respectively. 



(a) iterLap (b) mod-iterLap 

Fig. 7: Example]^ the blue and black contours are the target standardised density 
rx{‘) and the approximated standardised density r.-xi') respectively. The red 
crosses are the component means. 


Example 4- Define a target density p 2 :(-) ona; = {xa,Xb,Xc) (dim(a;a) = dim(a;t,) = 
1, dim(a:c) = 4): 


Pxi^x') — A^(Xa|/ia, 

N{Xb\p,b = Mxa - fj-a) +b,Eb = al-I) 

N {Xc\pc — C{xi P’a: Xb ; 


( 25 ) 










standardized density 



(a) iterLap: 0.300 seconds (b) mod-iterLap: 5.090 seconds 

Fig. 8: Examplethe plots of the ordered standardised densities. The blue and 
black curves are rx(-) and respectively. 


with: 


/ia = -0.5, al = 6.0, 

A = -2.0, 6 = -1.0, 


al = 0.2, C = 

al = (0.6,0.7,0.8,0.9)/3. 


/ 0.9 0.3 \ 
-0.3 -1.1 
-0.5 -0.6 ’ 

\ 0.3 0.2 / 


(26) 


Notice that there is a linear dependency of Xa, Xb and a non-linear depen¬ 
dency of Xa, Xb and Xc- The conditional variances and define the de¬ 
pendency strength. As it is impossible to visualise this density, we only show 
the ordered standardised densities in Figure § With Ticjmax — 200, has 

Uc = 17 components and rm-,xi‘) has ric = 73 components. s{rj;,ro-x) = 0.733 
and s{roo,rm-x) = 0.115. 

So, the functional approximation method still works with dim(a:) = 6. We 
further increase the dimension to see its performance with Example 
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(a) iterLap: 1.480 seconds 
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(b) mod-iterLap: 122.934 seconds 


Fig. 9: Example]^ the plots of the ordered standardised densities. The blue and 
black curves are rx{-) and respectively. 


Example 5. The same target density form in Equation 25 is used with dim(a;a) = 
dim(a:;,) = 2, dim(a:c) = 5 and following parameters: 


Ma = (-0.5,-1.0), 


A = 


0.5 -1.2 
-2.9 -1.3 


rT^ = (6.0,7.0), 

6 = (-1.0,-1.5), 


(27) 


a^ = (0.2,0.3), 

al = (0.8,0.9,1.0,l.l,1.2)/4. 


C = 


/ 0.9 -1.3-0.3 0.8\ 
-0.7 0.8 -0.10.6 
0.7 -0.6 1.4 1.5 
1.2 -1.2 0.3 0.0 
\ 1.3 1.4 1.4 0.0/ 


The iterLap and mod-iterLap approximations are first run with ric-^max = 200. 
Co;a;(-) and 7m,a;x(’) have tlc = 20 and ric = 50 components respectively and are 
plotted in Figures 10a and 10b Even though rm,a-,x{-) is better, it may not be 


very satisfying. Hence, we run mod-iterLap with more components and obtain 
rm,b\x{‘) with Uc = 237 components and rm,c-x(') with ric = 345 components 
in Figures 10c and lOd The approximations do get better with 
0.763, sItj;, r. 


c) = 0.522, s(rx,rm,b;x) = 0.295 and s(rx,r, 


) = 

xi < 7 n,c\x) — 0.244 but 


approximation errors are not as good as ones of previous examples. 

So, like many other methods, iterLap suffers from the curse of dimensionality, 
especially when there is non-linear dependency between many variables. 

In the last example, we will see how iterLap works with a non-normal noise 
and a constrained variable space. When used directly on such a constrained 
space, iterLap may have numerical issues in the optimisation and the Hessian 
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(a) iterLap: ric = 20 (4.441 seconds) 


(b) mod-iterLap: ric = 50 (145.620 seconds) 
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(c) mod-iterLap: ric = 237 (1.5 hours) 


(d) mod-iterLap: ric = 345 (7.0 hours) 


Fig. 10: Examplej^ the plots of the ordered standardised densities. The blue and 
black curves are rx{-) and respectively. 


evaluation at locations along the constrained border. Hence, it is better to trans¬ 
form a constrained space to a non-constrained space. 


Example 6. Consider a DLM: 


yt = xt+vt {t = l:n), 

xt = xt-i+ut (t = 2:n), 


(28) 

(29) 


where Ut ~ N{0,a'^ = A„^), Vt ~ A^(0, cr^ = A„^). The priors for precision 
parameters are Xu Gamma{a = 1, 6 = 0.5), A„ ~ Gamma{c = 1, d = 0.5). 











Notice that x = xi:n is an intrinsic Gaussian distribution with a precision 
matrix: 


Qx - 


/ 1 -1 0 0 ...\ 

-1 2 -1 0 ... 

0 -1 2 -1 ... 


V 


(30) 


/ 


One hundred data points yi-n (n = 100) are generated from Equation 28 with 
A* = 1/2^, A* = 1/1^. The joint posterior of (A„,A„,a;) is: 


P\^,K,x\y{^u,^v,x) oc Gamma{Xu\a,b)Gamma{Xv\c,d) 
N{x\0,Q^)N{y\x,Qy = A„4). 

It can be seen that the full conditional posterior of x is: 

Px\\^,\^,y{x) = N{x\n'^,Q'^), 

with: 


(31) 


(32) 


Q^x=Qx+Qy, (33) 

Q'xPx = QyV- (34) 

Using Equation |32| to marginalise Equation |3 1 1 with respect to cc, we can obtain 
the marginal density P\^,\^\y{Xu, A„): 

PA„,A„|y('^uj'^n) OC y\^^\^\y{Xu, Xy) (35) 

= Gamma{Xu\a, b)Gamma{Xv\c, d) 
(27r)-("-i)/^A("-i)/^|Q;|-i/2|Q,r/2 
^ \-y^Qyy + Px^QxPx' 


The log transform is used on (A„, A„) to get r„ = log(A„) and Ty = log(A„), 
which are non-constrained variables. The corresponding non-normalised marginal 
density (r^, t„) is: 

."^1! |y ) — ^A^t,Ay |y (-^ti55 (30) 


which is then approximated by iterLap. Finally, the approximated density 


is converted back to qx^,\^\y{Xu, Xy) by Equation 36 


Two versions, iterLap and mod-iterLap, are run with nc-,max = 30. The con¬ 
tour plots of standardised densities of (t„,t„) and (A„,A„) are shown in Figure 

m 


The ordered standardised densities are plotted in Figurej^ In the parametri- 
sation {Ty,Ty), s(r,F q) = 0.125, s{r,rm) = 0.03 while in the parametrisation 
(A„, A„), s(r,ro) = 0.135, s{r,rm) = 0.032. 

Instead of transforming the densities back and forth between constrained 
space and non-constrained space, which involves the evaluation of a Jacobian, 
it may be more practical to work directly on the non-constrained space in some 
cases, e.g. specify the prior and approximate the posterior on (t„, r„) in Example 

El 












(a) iterLap: (ru,T„) 


(b) iterLap: (A„,A„) 



(c) mod-iterLap: (r„,r„) (d) mod-iterLap: (A„,A„) 

Fig. 11: Example the blue and black contours are the target standardised 
density r and the approximated standardised density r respectively. The red 
crosses are the iterLap component means (in the parametrisation (T„,r„) or 
the corresponding transform (A„,A^) of component means (tu,t^). iterLap has 
ric = 6 components and mod-iterLap has ric = 19. 


5 Conclusion 

We have proposed a new solution to iterLap approximation with various im¬ 
plementation modihcations such as stopping rule adjustment, proposal of new 
residual function, starting point selection for numerical optimisation, scaling of 
Hessian matrix. In all examples of Section]^ mod-iterLap achieves better per¬ 
formance with longer running time. This computation cost is reasonable as the 
mod-iterLap add more components to correct the approximation without getting 
stuck like the original iterLap. The more the number of components, the longer 
the running time. In practice, the trade-off between correctness and running 
time can be controlled by the maximum number of components ric^max ■ Another 











standardized density standardized density 



index 


index 


(a) iterLap: (tu,Tv) (0.498 seconds) 


(b) iterLap: (A„,Ai,) (0.498 seconds) 




index 


5000 

index 


(c) mod-iterLap: (r„,T«) (5.024 seconds) 


(d) mod-iterLap: (Au,A„) (5.024 seconds) 


Fig. 12: Example]^ the plots of the ordered standardised densities. The blue and 
black curves are rx{-) and respectively. 


point is that the code of all these examples, iterLap, mod-iterLap is written in 
R. Hence, the running time should improve significantly if the code is ported to 
C language. 

Such a functional approximation provides a fast approximation to any tar¬ 
get density without relying on sampling which is another difficult and complex 
problem. For such sampling problems, iterLap can be used as a non-linear multi¬ 
modal sampling proposal in Monte Carlo methods, providing an efficient way to 
explore parameter space [7]. This is also the direction our future work. 
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